Smart logging of trace data for storage systems

ABSTRACT

An improved technique for storing trace data involves storing software operation debug trace information in a buffer memory rather than in a log file in the main memory, and after completion of the software operation either (1) deleting the debug trace information upon the successful completion of the operation, or (2) transferring the debug trace information to a log file memory upon a failure of the operation.

BACKGROUND

Complex electronic systems, such as data storage systems, typically run a large number of processes during a typical operation. Software utilized to execute these processes may be accordingly complex and large in scope. Developing software for electronic systems involves writing code and then testing the operation of the code in the electronic system to determine if the system performs as desired while running the software.

Some complex electronic systems send process data to a management system for tracking and analysis. For example, management software such as Unisphere® by EMC Corporation of Hopkinton, Mass., tracks and analyzes the data storage system's response to storage requests received over various time periods. In order to properly debug the software, the operation of the electronic system is traced so that each step in producing the final result is available for a software developer to analyze. Conventional techniques of debugging software configured to run processes within complex systems involve storing trace data in a log file within a management system. The trace data is then available to a developer should the need to debug the software arise, e.g., in response to a software-related failure.

SUMMARY

Unfortunately, the conventional techniques of debugging software during development phases require a very large amount of memory to store all of the trace data in the log file, which increases cost and decreases the amount of time that the trace data can be stored. The tracing generally results in large amounts of trace data which must be stored in a log file, and includes low level information such as the values of all parameters needed for the software developer to analyze all possible software errors. Since every possible parameter value is stored at each and every step of the program operation, the size of the log file may be very large, and consequently the length of time that the trace data may be economically stored may be short.

In contrast with the above-described conventional techniques which produce excessively large log files, improved techniques of debugging software configured to run processes within complex systems involve storing software operation debug trace information temporarily in a buffer memory, and when the software operation is completed evaluating the success of the operation and transmitting the debug trace data to different locations depending upon the result. For example, the temporarily stored trace data that accumulated in the buffer prior to the completion of the software operation may be either (1) deleted from the buffer memory upon the successful completion of the operation, or (2) transferred to a log file memory configured to store errors, warnings, and informational messages upon a failure of the operation. For example, the processing of storage requests in a data storage system may be continuously monitored by a management computer. The management computer receives data resulting from operation of the data storage system and generates various log data. The management computer will also send information that will become the debug data that represents every parameter and function passed into and out of the storage system during operation, as well as every value of every state variable. While the errors, warnings, and informational messages are stored in the log file memory, the trace data is first stored in a buffer until the management software deems the operation has been completed successfully or has either not been completed within a certain time period or has been completed unsuccessfully and flags a problem. For example, if a store operation has successfully stored the desired data in the correct location of a memory and all the indications are within the proper limits, then the trace data in the buffer may be released or deleted.

Advantageously, the improved techniques reduce the log file size required by only storing information useful in failure analysis. In certain cases the debug trace information may be set to have all of the trace data accumulated in the buffer transmitted to the log file whether or not the operation succeeded during a software development phase. After the completion of the software development phase, for example in a product release phase, the accumulated trace data may only be transmitted to the log file when the operation fails for failure analysis purposes. The technique reduces the amount of log file memory required to perform software development or operational phases by only storing information useful for failure analysis. Such improved techniques reduce log file size during the software development phase and may be useful during the product release phase for improved failure analysis capabilities.

Embodiments of the improved techniques include a computer program product having a non-transitory computer readable medium which stores a set of instructions for storing and managing trace data from an operation performed in an electronic system. The set of instructions causes a management computer to perform a method, the method comprising receiving trace data resulting from an operation performed by an electronic system in communication with the management computer, and storing the trace data in a buffer memory in communication with the management computer. When the management computer receives notice that a status of the operation is in a first state, the method includes transferring the trace data to a first location, and when the status of the operation is in a second state, transferring the trace data to a second location. The first state may indicate that the operation has been completed in a state that has not achieved specified limits, and consequently the operation has failed. The first location may be a log file memory where the trace data is stored more permanently than in the buffer memory, which may be cleared after the transfer to the log file memory. The second state may indicate that the operation has been completed in a state that has achieved specified limits, and consequently the operation has been successful. The second location may be a delete location where the trace data is dumped, and the buffer memory may be cleared for a next operation. The management computer analyzes the trace data to determine if the operation was completed successfully in part by searching the trace data to locate a software exception thrown by a processor of the electronic system, which indicates that the operation did not successfully complete, and as a result the trace data should be transferred to a log file for further analysis.

Other embodiments include an electronic apparatus including a network interface, a buffer memory, a memory, and processing circuitry coupled to the network interface, the memory and the buffer memory. The memory stores instructions which, when carried out by the processing circuitry, cause the processing circuitry to receive trace data from an electronic operation in an electronic system in communication with the processing circuitry at the buffer memory. The instructions cause the processing circuitry to determine the status of the operation and, when the status is in a first state indicating that the operation did not complete successfully, to transfer the trace data to a log file portion of the memory. When the status of the operation is in a second state indicating that the operation completed successfully, the processing circuitry deletes the trace data, thereby reducing the log file memory capacity needed to store trace data.

Other embodiments of the improved technique include a method of reducing memory requirements including receiving, at a processor, trace data for an electronic operation, receiving information on the status of the electronic operation, transferring the trace data to a log file memory when the status indicates completion of the electronic operation outside of specified limits, and deleting the trace data when the status indicates completion of the electronic operation within the specified limits.

Other embodiments of the improved techniques may be imagined and include apparatus, device, system components and circuitry enabled to trace software operations in electronic devices and either store trace data for operations that have been completed outside of desired limits and may have problems that will benefit from further analysis, or delete trace data for operations that have completed within the desired limits, thus reducing memory space requirements for electronic systems. Although the improved techniques have been described using an example of storage software development and failure analysis, the techniques may also be applied to any complex system, for example, process development and production monitoring for a manufacturing line such as a semiconductor fabrication facility. In such a manufacturing facility the product produced is measured and the results recorded at hundreds of steps along the fabrication flow, and the data storage facility requirements are high and may be reduced by use of the improved techniques for smart logging of trace data.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features and advantages will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of various embodiments of the invention.

FIG. 1 is a line drawing illustrating an electronic environment where the improved technique may be implemented.

FIG. 2 is a line drawing illustrating an implementation of the improved technique in a first condition.

FIG. 3 is a line drawing illustrating an implementation of the improved technique in a second condition.

FIG. 4 is a flow chart illustrating the steps of an exemplary operation using the improved technique.

DETAILED DESCRIPTION

An improved technique for storing trace data involves storing software operation debug trace data temporarily in a buffer memory rather than storing the trace data directly in a log file in the main memory. After completion of the software operation the technique will either (1) delete the debug trace data upon successful completion of the operation, or (2) transfer the debug trace data to a log file memory upon failure of the operation. For example, when a piece of new software is in a development phase a data storage system operating on the newly developed software may be continuously monitored by a computer configured to manage software operations. The management computer receives data resulting from development operation, generates various log data, and also generates debug data that includes every parameter and function passed into and out of the storage system during operation, and every value of every state variable. This trace data is used by the software developer to improve the software, and is stored in a log file. However, the size of the log file may become very large. In order to reduce the size of the log file in the development phase the improved technique stores the trace data in a buffer memory until the management software deems the operation has been completed. The management software then either transfers the trace data in the buffer memory to the log file if it determines that the operation is outside of acceptable limits, or deletes the trace data if it determines that the operation is within the acceptable limits. The improved technique reduces the log file size required by only storing information useful in failure analysis.
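
The buffer-then-decide behavior described above can be illustrated with a minimal Python sketch. The class name `TraceBuffer`, the `finish` method, and the use of a plain list as the log file are assumptions made for illustration only; they do not appear in the described embodiments.

```python
from dataclasses import dataclass, field


@dataclass
class TraceBuffer:
    """Hypothetical buffer holding trace records for one operation."""
    records: list = field(default_factory=list)

    def append(self, record: str) -> None:
        # Trace data accumulates here instead of going straight to the log.
        self.records.append(record)

    def finish(self, succeeded: bool, log_file: list) -> None:
        # On failure, move the buffered trace to the log file;
        # on success, simply discard it. The buffer is emptied either way.
        if not succeeded:
            log_file.extend(self.records)
        self.records.clear()


log_file = []  # stand-in for the log file memory

ok = TraceBuffer()
ok.append("store: addr=0x10 len=4")
ok.finish(succeeded=True, log_file=log_file)

bad = TraceBuffer()
bad.append("store: addr=0x20 len=4")
bad.append("store: checksum mismatch")
bad.finish(succeeded=False, log_file=log_file)
```

In this sketch only the two records of the failed operation reach `log_file`; the trace of the successful operation is dropped.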

The debug trace information may alternatively be set to transmit all information to the log file whether or not the operation was completed within the specified limits during a software development phase, or the debug trace data may be set to only transmit the debug trace data when the operation fails during a product release or an operational use phase for use in failure analysis. The technique reduces the amount of log file memory required to perform software development or to perform failure analysis in an operational phase, by only storing information useful for failure analysis. Such improved techniques reduce log file size during both the software development phase as well as during the product release phase for improved failure analysis capability.
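
The two settings described above can be sketched as a small policy function. The names `TraceMode`, `KEEP_ALL`, `FAILURES_ONLY`, and `should_persist` are hypothetical and chosen only for this illustration.

```python
from enum import Enum


class TraceMode(Enum):
    """Hypothetical setting for what happens to buffered trace data."""
    KEEP_ALL = "keep_all"            # e.g., during software development
    FAILURES_ONLY = "failures_only"  # e.g., during product release


def should_persist(mode: TraceMode, succeeded: bool) -> bool:
    """Return True when the buffered trace should go to the log file."""
    if mode is TraceMode.KEEP_ALL:
        return True
    # FAILURES_ONLY: keep the trace only when the operation failed.
    return not succeeded


dev_success = should_persist(TraceMode.KEEP_ALL, succeeded=True)
release_success = should_persist(TraceMode.FAILURES_ONLY, succeeded=True)
release_failure = should_persist(TraceMode.FAILURES_ONLY, succeeded=False)
```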

FIG. 1 illustrates an electronic environment 100 where the improved technique may be implemented. Electronic environment 100 includes a complex electronic system 108 and a memory 120 connected to users 102, 104 and 106 by bidirectional communication media 112, 114 and 116 respectively. The complex electronic system 108 may be a storage processor such as may be used in large memory systems. Users 102, 104 and 106 may be using electronic devices such as computers, servers, personal computers, tablets, smart phones, commercial computer systems, financial computer systems and other easily imagined electronic systems. Users 102, 104 and 106 may, for example, transfer data either to or from the memory 120 via the storage processor 108 and bidirectional communications line 118. The communications lines may be wired or wireless, use electrical, optical or RF signals, and may include any form of communication capable of transmitting information between electronic devices.

The storing and fetching of data between the users and the memory 120 may be subject to errors and these errors may be more likely to occur during development of new software programs. The activity of the electronic environment 100 may be detected and analyzed by a manager 122, which may have a separate set of communications lines or may be connected to the existing communications line 118. The manager 122 may be a management computer such as the Unisphere® system made by EMC Corporation of Hopkinton, Mass. In this description the manager 122 is shown as having separate communications lines for greater simplicity of description, but the improved techniques are not so limited. The manager 122 can examine every parameter and every value of every state variable at each individual software step of an operation in the storage processor 108, for example, a store command from user 102 to record an item of information in a location in the memory 120. The manager 122 can store every detail of the store operation; determine whether or not the operation is complete; and determine whether or not the operation completed without any errors. This data is called trace data.

The manager 122 includes a buffer memory in which the trace data for the store operation is accumulated while the store operation occurs. The buffer memory may be included within the manager 122 or on a separate device. The manager 122 may be a separate device as shown, or it might be a portion of the storage processor 108 or of some other electronic device. The manager 122 includes memory containing acceptable limits for the completion of the operations likely to occur in the environment 100. Using these limits the manager 122 determines whether or not the operation was completed within the limits (i.e., whether or not it completed successfully). Many possible successful completion metrics may be imagined to determine if a software operation has been completed properly, including various programming language mechanisms, such as a C++ exception, or logging facility calls such as logger.errorDetected, or a timeout limit for total operational length.
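
A success check combining these metrics might look like the following Python sketch. The record prefixes `EXCEPTION` and `ERROR`, the function name `operation_succeeded`, and the default timeout value are assumptions made for illustration; the description does not specify a trace format or particular limits.

```python
# Hypothetical prefixes marking a thrown exception or an error-level log call.
FAILURE_PREFIXES = ("EXCEPTION", "ERROR")


def operation_succeeded(trace, elapsed_s, timeout_s=5.0):
    """Return True when the operation completed within the assumed limits."""
    # Metric 1: total operation length must not exceed the timeout limit.
    if elapsed_s > timeout_s:
        return False
    # Metrics 2 and 3: no exception and no error call recorded in the trace.
    return not any(record.startswith(FAILURE_PREFIXES) for record in trace)


clean = operation_succeeded(["store begin", "store end"], elapsed_s=0.2)
threw = operation_succeeded(["store begin", "EXCEPTION bad address"], elapsed_s=0.2)
slow = operation_succeeded(["store begin", "store end"], elapsed_s=9.0)
```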

The manager 122 may transfer the accumulated trace data for the operation at the end of the determination of operation success, or the manager 122 may wait until enough operations have been completed to reach a limit of the memory capacity of the buffer memory. Transferring the combined trace data of the operations at a single time with a single write call is more efficient and results in lower overhead for the system. The increased efficiency of transferring accumulated trace data for an operation in a single write, rather than the prior method of storing the trace data in the log file in memory 120 as the trace data is gathered, is another benefit of the improved technique.
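
The single-write transfer can be sketched as follows. `BatchingTraceWriter`, `CountingSink`, and the character-based capacity are hypothetical names and simplifications; the sketch only shows trace data from several operations being combined and written with one call once an assumed capacity is reached.

```python
import io


class BatchingTraceWriter:
    """Hypothetical writer that combines kept traces into one write call."""

    def __init__(self, sink, capacity_chars):
        self.sink = sink
        self.capacity_chars = capacity_chars  # assumed buffer limit, in characters
        self.pending = []
        self.pending_chars = 0

    def add_operation(self, records):
        # Queue the trace of one completed operation that is to be kept.
        chunk = "".join(record + "\n" for record in records)
        self.pending.append(chunk)
        self.pending_chars += len(chunk)
        if self.pending_chars >= self.capacity_chars:
            self.flush()

    def flush(self):
        # One write call for everything accumulated so far.
        if self.pending:
            self.sink.write("".join(self.pending))
            self.pending.clear()
            self.pending_chars = 0


class CountingSink(io.StringIO):
    """In-memory stand-in for the log file that counts write calls."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, text):
        self.write_calls += 1
        return super().write(text)


sink = CountingSink()
writer = BatchingTraceWriter(sink, capacity_chars=40)
writer.add_operation(["op1: step a", "op1: failed"])  # below the limit, held
writer.add_operation(["op2: step a", "op2: failed"])  # limit reached, one write
```

Here four trace records from two operations reach the sink through a single `write` call, rather than one call per record.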

After completion of the operation the manager 122 evaluates the trace data and transfers the trace data to a selected location in memory 120, such as a log file, if the evaluation indicates that the operation did not complete properly, for example, not within the specified limits. Thus data on problems with the software or hardware may be available for failure analysis. Alternatively, the manager 122 may erase the trace data, effectively transferring the trace data to a dump location, if the evaluation indicates that the operation was completed within the specified limits. The manager 122 may also send a notice to a selected location to identify selected completion states, or the manager may collect data on the results of all the operation success determinations over a period of time.

FIG. 2 is a line drawing illustrating an implementation of the improved technique in a first condition in environment 200, with a storage processor 208 (used as an example of a complex electronic device) in communication with a memory 220 via communication line 230, and with a plurality of users (not shown) as previously discussed. Monitoring storage processor 208 via monitor communication line 228, an electronic device 202, for example the manager 122 previously discussed with reference to FIG. 1, collects information about the operations of the storage processor 208. The electronic device 202 may collect information such as error indications from the storage processor 208, which the electronic device 202 transfers using an interface circuit 206 to a location in memory 220, for example a log file, via communication signal 210. In similar fashion warning signals and other information are transferred by the electronic device 202 to the log file in memory 220.

Trace data from the operation occurring in the storage processor 208 may be collected by a logger circuit 204 and transferred to a buffer memory 224 via a data communication line 216. The buffer 224 is shown as a separate device in this illustration for ease of description, but the improved technique is not so limited, and the buffer may be located in the electronic device 202 or anywhere else that is convenient. The buffer 224 accumulates the trace data as the operation proceeds as discussed previously until the electronic device 202 logger circuit 204 sends the buffer 224 an operation complete signal via communication line 218. Evaluation circuitry in the electronic device 202 or in the buffer 224 determines whether or not the condition of the completed operation is within the specified limits. In the first condition shown in this figure the completed operation has been found to be within the specified limits and is deemed to be successful, and the trace data is erased, as indicated by the transferring of the trace data via communication line 219 to the dump 226.

In this example of the improved technique it is assumed that the trace data of successfully completed operations is not needed for failure analysis and thus may be deleted to save memory space, but the technique is not so limited and the electronic device 202 may be configured to save all trace data for other easily imagined reasons.

FIG. 3 is a line drawing illustrating an implementation of the improved technique in a second condition where the trace data accumulated in the buffer 324 has been analyzed by the electronic device 302 and the operation found to have been completed outside of the specified limits and thus a failure that may be of interest in failure analysis. In this situation the trace data is transferred to a log file in the memory 320 via communication line 319. In other embodiments of the improved technique the trace data may be erased from the buffer memory after being transferred to the log file, or it may be transferred to an additional location depending upon the results of the analysis, or it may be data compressed and continue to be stored in the buffer 324.

FIG. 4 is a flow chart illustrating the steps of an example method including the improved technique. At 402, the electronic device 202 of FIG. 2 initiates the logger circuit 204 to record trace data of an operation occurring in the storage processor 208. The operation may be a read request from a user for data stored in a memory 220.

At 404, the trace data accumulates in a buffer 224 until the operation is completed. At 406, the electronic device 202 performs an analysis on the trace data in the buffer 224 to determine if the operation was completed within specified limits.

At 408, the electronic device 202 transfers the trace data to a memory 220. At 410, if the analysis shows that the operation was completed outside of the specified limits, the method transfers the trace data to a log file, which may be located in memory 220. The method then ends at 414. If the analysis shows that the operation was completed within the specified limits the method may erase the trace data at step 412, and end at 414.
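
The flow of FIG. 4 can be sketched in Python as follows. The function names, the callback-style `operation`, and the `EXCEPTION` record prefix are assumptions for illustration, and the sketch folds the transfer of steps 408 and 410 into a single branch, which is a simplification rather than a statement of how the flow chart divides them.

```python
def run_traced(operation, log_file, within_limits):
    """Hypothetical walk through steps 402 to 414 for one operation."""
    buffer = []                      # 402: begin recording trace data
    try:
        operation(buffer.append)     # 404: trace accumulates in the buffer
    except Exception as exc:         # a thrown exception becomes a trace record
        buffer.append(f"EXCEPTION {type(exc).__name__}: {exc}")
    ok = within_limits(buffer)       # 406: analyze the buffered trace
    if not ok:
        log_file.extend(buffer)      # 410: failure, transfer to the log file
    buffer.clear()                   # 412: erase the buffered trace
    return ok                        # 414: end


def no_exception(trace):
    # Assumed limit check: success means no exception record in the trace.
    return not any(record.startswith("EXCEPTION") for record in trace)


def good_read(trace):
    trace("read: addr=0x40")
    trace("read: done")


def bad_read(trace):
    trace("read: addr=0x80")
    raise ValueError("address out of range")


log_file = []
good_result = run_traced(good_read, log_file, no_exception)
bad_result = run_traced(bad_read, log_file, no_exception)
```

After both calls, `log_file` holds only the trace of `bad_read`, including the record generated from the exception.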

With such a technique storing trace data can include only storing trace data of operations that are of interest in failure analysis. The technique may be implemented in software which may be located in the electronic device to be evaluated, such as a storage processor, or in a monitoring computer evaluating the operation of a storage system.

While various embodiments of the invention have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

What is claimed is:
 1. A method, executed by a processor, of reducing memory requirements, comprising: receiving, at the processor, trace data for an electronic operation performed by an electronic system in communication with the processor; storing the trace data in a buffer memory in communication with the processor; receiving information on a status of the electronic operation; transferring the trace data from the buffer memory to a log file memory when the status indicates completion of the electronic operation outside of specified limits; and deleting the trace data when the status indicates completion of the electronic operation within the specified limits, wherein the method further comprises: analyzing the trace data, by the processor, to locate a software exception thrown by a processor of the electronic system, a presence of the software exception indicating that the operation did not complete successfully; storing the trace data in the log file memory when the software exception was located in the trace data; and deleting the trace data when the software exception was not located in the trace data.
 2. The method of claim 1 wherein receiving the trace data includes a separate buffer memory for accumulating the trace data until a completion of the electronic operation is indicated by the processor, and transferring the trace data to the log file memory includes transferring all of the trace data in a single operation.
 3. The method of claim 2 wherein transferring the trace data to the log file memory is delayed until accumulating the trace data results in a data file that reaches a predetermined size limit for the buffer memory.
 4. The method of claim 2 wherein the data trace includes a debug trace for developing software in a storage system.
 5. An electronic apparatus, comprising: a network interface; a buffer memory; a memory; and processing circuitry coupled to the network interface, the memory and the buffer memory, the memory storing instructions which, when carried out by the processing circuitry, cause the processing circuitry to: receive trace data from an electronic operation in an electronic system in communication with the processing circuitry at the buffer memory; when a status of the electronic operation is in a first state, transfer the trace data from the buffer memory to a first location; when the status of the electronic operation is in a second state, transfer the trace data to a second location different from the first location thereby reducing the log file memory capacity needed to store trace data, wherein first location includes a log file memory, and the processing circuitry caused to transfer the trace data to the second location is further caused to delete the trace data, wherein the first state indicates that the operation did not complete successfully and the second state indicates that the operation completed successfully; and wherein the processing circuitry further performs the steps of: analyzing the trace data to locate a software exception thrown by a processor of the electronic system, a presence of the software exception indicating that the operation did not complete successfully; storing the trace data in the log file memory when the software exception was located in the trace data; and deleting the trace data when the software exception was not located in the trace data.
 6. The electronic apparatus of claim 5 further including the processing circuitry formatting the trace data in the log file memory the same as a formatting in the buffer memory.
 7. A computer program product having a non-transitory computer readable medium which stores instructions for managing trace data resulting from operations performed by electronic systems, the set of instructions causing a management computer to perform a method, the method comprising: receiving trace data resulting from an operation performed by an electronic system in communication with the management computer; storing the trace data in a buffer memory in communication with the management computer; when the operation is in a first state, transferring the trace data from the buffer memory to a first location; and when the operation is in a second state, transferring the trace data to a second location apart from the first location, wherein transferring the trace data to a first location includes transferring the trace data to a log file memory, and transferring the trace data to a second location includes deleting the trace data, wherein the first state indicates that the operation did not complete successfully and the second state indicates that the operation completed successfully; and wherein the method further comprises: analyzing the trace data to locate a software exception thrown by a processor of the electronic system, a presence of the software exception indicating that the operation did not complete successfully; storing the trace data in the log file memory when the software exception was located in the trace data; and deleting the trace data when the software exception was not located in the trace data.
 8. The computer program product of claim 7, further including formatting the trace data in the log file memory the same as a formatting in the buffer memory.
 9. The computer program product of claim 7, wherein transferring the trace data to the log file memory is delayed until a number of smaller operations are accumulated into a group having a storage memory size that is smaller than a memory size of the buffer memory.
 10. The computer program product of claim 7, wherein storing the trace data in the buffer memory further includes extending a size of the buffer memory for an operation including a trace data set that exceeds a size of the buffer memory.
 11. The method of claim 7, wherein the operation is one of multiple operations performed by the electronic system, and wherein transferring the trace data from the buffer memory to the first location is performed after completion of the operation.
 12. The method of claim 11, wherein the management computer includes a logger circuit and evaluation circuitry, and wherein the method further comprises: issuing, by the logger circuit, an operation complete signal upon completion of the operation; and generating, by the evaluation circuitry in response to the operation complete signal, an indication that the operation is in the first state.
 13. The method of claim 12, wherein the electronic system includes a storage processor of a data storage system, and wherein the operation is a store operation performed by the storage processor in response to a store command received from an electronic device operated by a user.